{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "orig_nbformat": 2,
    "file_extension": ".py",
    "mimetype": "text/x-python",
    "name": "python",
    "npconvert_exporter": "python",
    "pygments_lexer": "ipython3",
    "version": 3,
    "colab": {
      "name": "code_switching_demo.ipynb",
      "provenance": [],
      "include_colab_link": true
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "accelerator": "GPU"
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-github",
        "colab_type": "text"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/github/Tomiinek/Multilingual_Text_to_Speech/blob/master/notebooks/code_switching_demo.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "FfGXlx6WoB99"
      },
      "source": [
        "# Multilingual Text-to-Speech Demo\n",
        "\n",
        "This notebook demonstrates multilingual code-switching text-to-speech using:\n",
        "\n",
        "- Tacotron based spectrogram generation: https://github.com/Tomiinek/Multilingual_Text_to_Speech\n",
        "- WaveRNN vocoder: https://github.com/Tomiinek/WaveRNN, forked from fatchord/WaveRNN\n",
        "\n",
        "\n",
        "**Estimated time to complete**: 5 minutes\n",
        "\n"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "Zr9vniJDkkSs",
        "colab": {}
      },
      "source": [
        "import sys\n",
        "import os\n",
        "import IPython\n",
        "from IPython.display import Audio"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "UitOTnnVo8Ch"
      },
      "source": [
        "## Clone repositories"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "Bx_K6Ondoee8",
        "colab": {}
      },
      "source": [
        "os.chdir(os.path.expanduser(\"~\"))\n",
        "    \n",
        "tacotron_dir = \"Multilingual_Text_to_Speech\"\n",
        "if not os.path.exists(tacotron_dir):\n",
        "  ! git clone https://github.com/Tomiinek/Multilingual_Text_to_Speech # $tacotron_dir\n",
        "\n",
        "wavernn_dir = \"WaveRNN\"\n",
        "if not os.path.exists(wavernn_dir):\n",
        "  ! git clone https://github.com/Tomiinek/$wavernn_dir"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "uoiQKNlapBxd"
      },
      "source": [
        "## Download pre-trained models"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "hSfJk-_1teMc",
        "colab": {}
      },
      "source": [
        "! mkdir -p checkpoints\n",
        "os.chdir(os.path.join(os.path.expanduser(\"~\"), \"checkpoints\"))\n",
        "\n",
        "tacotron_chpt = \"generated_switching.pyt\"\n",
        "if not os.path.exists(os.path.join(os.path.expanduser(\"~\"), \"checkpoints\", tacotron_chpt)):\n",
        "  ! curl -O -L \"https://github.com/Tomiinek/Multilingual_Text_to_Speech/releases/download/v1.0/$tacotron_chpt\" \n",
        "\n",
        "wavernn_chpt = \"wavernn_weight.pyt\"\n",
        "if not os.path.exists(os.path.join(os.path.expanduser(\"~\"), \"checkpoints\", wavernn_chpt)):\n",
        "  ! curl -O -L \"https://github.com/Tomiinek/Multilingual_Text_to_Speech/releases/download/v1.0/$wavernn_chpt\"     \n",
        "\n",
        "os.chdir(os.path.expanduser(\"~\"))"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "unEvny42pGTo"
      },
      "source": [
        "## Install dependencies"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "tBC1lNLuzBRj",
        "colab": {}
      },
      "source": [
        "! pip install -q -U soundfile\n",
        "! pip install -q -U phonemizer\n",
        "! pip install -q -U epitran"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "ZG3bzFjsqMXE"
      },
      "source": [
        "## Input texts to be synthesized"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "Lh_TNrZvzJLY"
      },
      "source": [
        "Inputs consist of **three parts delimited** by `|`:\n",
        "  - **Input utterance** - Only a basic normalization is applied to input utterances, so **you should not use obscure characters and punctuation**. See the examples below that are formatted properly.\n",
        "  - **Speaker ID** - There are more available speaker IDs, but **you should use just one of** `00-fr`, `00-de`, `00-nl`, `09-ru`, and `00-zh` as the WaveRNN vocoder was trained only on their voices.\n",
        "  - **Per-character language** specification - You have to provide a **list of language codes** (one of `de`, `fr`, `nl`, `ru`, `zh`) **with the number of their characters delimited by comma**, e.g., `l1-n1,l2-n2,l3` says that the language `l1` occupies `n1` characters from the beginning, the language `l2` takes next `n2` characters and the language `l3` has all the remaining characters to the end. **You can mix up more languages** to control accent by replacing language codes (such as `l1`) with `l1*w1:l2*w2`, which means that the language `l1` has the weight `w1` and `l2` has the weight `w2`. For example, `de*0.75:fr*0.25` combines German and French with more emphasis on German.\n",
        "\n",
        "Feel free to modify the examples below.\n",
        "\n",
        "\n",
        "\n",
        "\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "7AjhZ_Nx4qEY"
      },
      "source": [
        "**Run this to demonstrate code switching:**"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "62GS3ew4zCW_",
        "colab": {}
      },
      "source": [
        "inputs = [\n",
        "    \"Cette requête s'explique par les relations peu conventionnelles que Schrödinger entretient avec les femmes:|00-fr|fr-68,de-11,fr\",\n",
        "    \"Ces quartiers, parmi lesquels figurent De Pijp, le Kinkerbuurt et le Dapperbuurt, sont principalement financés par des banquiers et|00-fr|fr-39,nl-7,fr-5,nl-11,fr-7,nl-11,fr\",\n",
        "    \"Les romans de Фёдор Михайлович Достоевский sont parfois qualifiés de métaphysiques,|00-fr|fr-14,ru-28,fr\",\n",
        "    \"Le yǒngdìnghé est une rivière du nord de la Chine. Elle est l'un des tributaires du fleuve hǎihé.|00-fr|fr-3,zh-10,fr-78,zh-5,fr\",\n",
        "    \"François Hollande ist ein französischer Politiker der Sozialistischen Partei und war Staatspräsident der Französischen Republik.|00-de|fr-17,de\",\n",
        "    \"Sie liegt zwischen dem Ijsselmeer, der Ijssel und den Hügeln der Veluwe.|00-de|de-23,nl-10,de-6,nl-6,de-20,nl\",\n",
        "    \"Ключевская сопка erreicht ihre außerordentliche Höhe,|00-de|ru-16,de\",\n",
        "    \"Der tiānān ménguǎngcháng ist ein Platz im Zentrum von Peking, der Hauptstadt der Volksrepublik China.|00-de|de-4,zh-20,de\",\n",
        "    \"Als men langs deze laan loopt van de Brandenburger Tor tot aan de Alexanderplatz over de Schloßbrücke vanaf welke|00-nl|nl-37,de-17,nl-12,de-14,nl-9,de-12,nl\",\n",
        "    \"De naam van De Gaulle leeft voort in het grootste vliegveld van Frankrijk, Aéroport Charles De Gaulle.|00-nl|nl-12,fr-9,nl-54,fr-26,nl\",\n",
        "    \"Nog steeds wordt Александр Сергеевич Пушкин in de Russische wereld en daarbuiten vereerd en gelezen.|00-nl|nl-17,ru-26,nl\",\n",
        "    \"De chángjiāng stroomt vervolgens door zhòngqìng, de grootste stad van sìchuān.|00-nl|nl-3,zh-10,nl-25,zh-9,nl-23,zh-7,nl\",\n",
        "    \"При нём трудами Pöppelmanna и других придворных мастеров центр Dresdenа приобрёл знакомый облик в стиле барокко.|09-ru|ru-16,de-10,ru-37,de-7,ru\",\n",
        "    \"Как считают современные археологи, на месте Notre-Dame de Paris находились четыре различных храма:|09-ru|ru-44,fr-19,ru\",\n",
        "    \"Johannes Vermeer van Delft был известным экспертом по вопросам искусства.|09-ru|nl-26,ru\",\n",
        "    \"На протяжении истории běijīng был известен в Китае под разными именами.|09-ru|ru-22,zh-7,ru\",\n",
        "    \"tā de fùqīn Rudolf Schrödinger è shì shēngchǎn yóubù hé fángshǔibù de gōngchǎng zhǔ tóngshí yě shì yīmíng yuányìjiā。|00-zh|zh-12,de-18,zh\",\n",
        "    \"tāmen tóupiào juédìng jiànzào Sacré-Coeur， érqiě dìngyìtā wèi dùi bālígōngshè shèyuán suǒ fànxià de zùixíng de bǔcháng|00-zh|zh-30,fr-11,zh\",\n",
        "    \"Vincent van Gogh de wǔwèi bóbó shūshū men， yǒu sānwèi shì xiāngdāng chénggōng de yìshùpǐn jiāoyìshāng。|00-zh|nl-16,zh\",\n",
        "    \"yóuyú Александр shì dùi Кутузов huáiyǒu ègǎn， tā zài jūndùi de lǐngdǎozhíwù bèi zàicì chèxiāo。|00-zh|zh-6,ru-9,zh-9,ru-7,zh\"\n",
        "]"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "ZV80UbEJ4z_E"
      },
      "source": [
        "**Run this to demonstrate smooth pronunciation control:**"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "q7UpD1Wi-tpT",
        "colab": {}
      },
      "source": [
        "inputs = [\n",
        "    \"Jean-Paul Marat fait deux voyages en Angleterre au temps de la Révolution.|00-fr|fr\",\n",
        "    \"Jean-Paul Marat fait deux voyages en Angleterre au temps de la Révolution.|00-fr|de*0.1:fr*0.9\",\n",
        "    \"Jean-Paul Marat fait deux voyages en Angleterre au temps de la Révolution.|00-fr|de*0.2:fr*0.8\",\n",
        "    \"Jean-Paul Marat fait deux voyages en Angleterre au temps de la Révolution.|00-fr|de*0.3:fr*0.7\",\n",
        "    \"Jean-Paul Marat fait deux voyages en Angleterre au temps de la Révolution.|00-fr|de*0.4:fr*0.6\",\n",
        "    \"Jean-Paul Marat fait deux voyages en Angleterre au temps de la Révolution.|00-fr|de*0.5:fr*0.5\",\n",
        "    \"Jean-Paul Marat fait deux voyages en Angleterre au temps de la Révolution.|00-fr|de*0.6:fr*0.4\",\n",
        "    \"Jean-Paul Marat fait deux voyages en Angleterre au temps de la Révolution.|00-fr|de*0.7:fr*0.3\",\n",
        "    \"Jean-Paul Marat fait deux voyages en Angleterre au temps de la Révolution.|00-fr|de*0.8:fr*0.2\",\n",
        "    \"Jean-Paul Marat fait deux voyages en Angleterre au temps de la Révolution.|00-fr|de*0.9:fr*0.1\",\n",
        "    \"Jean-Paul Marat fait deux voyages en Angleterre au temps de la Révolution.|00-fr|de\",\n",
        "]"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "rwb29r1343Wd"
      },
      "source": [
        "**Run this to demonstrate voice cloning:**"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "GMcpymVg43rL",
        "colab": {}
      },
      "source": [
        "inputs = [\n",
        "    \"Der Distrikt liegt in den Kafueauen und ist von Landwirtschaft geprägt.|00-fr|de\",\n",
        "    \"Der Distrikt liegt in den Kafueauen und ist von Landwirtschaft geprägt.|00-de|de\",\n",
        "    \"Le texte complet de l'initiative peut être consulté sur le site de la Chancellerie fédérale.|00-fr|fr\",\n",
        "    \"Le texte complet de l'initiative peut être consulté sur le site de la Chancellerie fédérale.|00-de|fr\",\n",
        "    \"Dit wordt de start van Van Oostzanens carrière als zelfstandig kunstschilder.|00-fr|nl\",\n",
        "    \"Dit wordt de start van Van Oostzanens carrière als zelfstandig kunstschilder.|00-de|nl\",\n",
        "    \"Название штата произошло благодаря серии картографических ошибок и неточностей.|00-fr|ru\",\n",
        "    \"Название штата произошло благодаря серии картографических ошибок и неточностей.|00-de|ru\",\n",
        "    \"jìsuànjī dàxué zhǔyào xuékē shì kēxué hé jìzhúbù， xuéshēng kěyǐ huòqǔ jìsuànjīkēxué hé jìzhú de běnkē xuéwèi|00-fr|zh\",\n",
        "    \"jìsuànjī dàxué zhǔyào xuékē shì kēxué hé jìzhúbù， xuéshēng kěyǐ huòqǔ jìsuànjīkēxué hé jìzhú de běnkē xuéwèi|00-de|zh\"\n",
        "]"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "cvI4YEqQqYB-"
      },
      "source": [
        "## Synthesis"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "0__VpmceqnEw"
      },
      "source": [
        "### Spectrogram generation"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "rzGCLiaZzOjT",
        "colab": {}
      },
      "source": [
        "os.chdir(os.path.join(os.path.expanduser(\"~\"), tacotron_dir))\n",
        "if \"utils\" in sys.modules: del sys.modules[\"utils\"]\n",
        "\n",
        "from synthesize import synthesize\n",
        "from utils import build_model\n",
        "\n",
        "model = build_model(os.path.join(os.path.expanduser(\"~\"), \"checkpoints\", tacotron_chpt))\n",
        "model.eval()\n",
        "\n",
        "spectrograms = [synthesize(model, \"|\" + i) for i in inputs]"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "i1TNHEeTqbVT"
      },
      "source": [
        "### Waveform generation"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "kYJiruf5zPIa",
        "colab": {}
      },
      "source": [
        "os.chdir(os.path.join(os.path.expanduser(\"~\"), wavernn_dir))\n",
        "if \"utils\" in sys.modules: del sys.modules[\"utils\"]\n",
        "\n",
        "from models.fatchord_version import WaveRNN\n",
        "from utils import hparams as hp\n",
        "from gen_wavernn import generate\n",
        "import torch\n",
        "\n",
        "hp.configure('hparams.py')\n",
        "model = WaveRNN(rnn_dims=hp.voc_rnn_dims, fc_dims=hp.voc_fc_dims, bits=hp.bits, pad=hp.voc_pad, upsample_factors=hp.voc_upsample_factors, \n",
        "                feat_dims=hp.num_mels, compute_dims=hp.voc_compute_dims, res_out_dims=hp.voc_res_out_dims, res_blocks=hp.voc_res_blocks, \n",
        "                hop_length=hp.hop_length, sample_rate=hp.sample_rate, mode=hp.voc_mode).to(torch.device('cuda' if torch.cuda.is_available() else 'cpu'))\n",
        "model.load(os.path.join(os.path.expanduser(\"~\"), \"checkpoints\", wavernn_chpt))\n",
        "\n",
        "waveforms = [generate(model, s, hp.voc_gen_batched, hp.voc_target, hp.voc_overlap) for s in spectrograms]"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "c8YNfGF3qf27"
      },
      "source": [
        "## Resulting audios\n",
        "\n"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "mq4xW6iOzReQ",
        "colab": {}
      },
      "source": [
        "for idx, w in enumerate(waveforms):\n",
        "  print(inputs[idx])\n",
        "  IPython.display.display(IPython.display.Audio(data=w, rate=hp.sample_rate))"
      ],
      "execution_count": 0,
      "outputs": []
    }
  ]
}